Abstract. Research on methods of meta-analysis (the synthesis of
related study results) has dealt with many simple study indices, but less
attention has been paid to the issue of summarizing regression slopes. 
In part this is because of the many complications that arise when real 
sets of regression models are accumulated. We outline the complexities 
involved in synthesizing slopes, describe existing methods of analysis 
and present a multivariate generalized least squares approach to the 
synthesis of regression slopes. 
We begin with a discussion of the rationale for
summarizing regression slopes, a practice that has 
become more prevalent in meta-analyses in recent 
years. We then examine the methods for summariz- 
ing slopes that have been proposed to date, and the 
assumptions and data requirements of those meth- 
ods. We conclude by presenting a generalized least 
squares (GLS) approach to the synthesis of regres- 
sion slopes for continuous predictors and outcomes, 
with remarks on the challenges and limitations to 
synthesis of such estimates. 

1. SYNTHESIZING SLOPES 

While it is by no means common or well under- 
stood, the synthesis of regression slopes has received 
increased attention in recent years (e.g., Baker et al.
(2003); Peterson and Brown (2005); Roberts (2005); 



Betsy Jane Becker is Professor of Measurement and 
Statistics, College of Education, Florida State 
University, Tallahassee, Florida 32306, USA e-mail: 
bbecker@fsu.edu. Meng-Jia Wu is Assistant Professor of 
Research Methodology, School of Education, Loyola 
University Chicago, Chicago, Illinois 60611, USA 
e-mail: mwu2@luc.edu. 



This is an electronic reprint of the original article 
published by the Institute of Mathematical Statistics in 
Statistical Science, 2007, Vol. 22, No. 3, 414-429. This 
reprint differs from the original in pagination and 
typographic detail. 



Rose and Stanley (2005)). This growing interest is 
likely related to the increasingly complex models in- 
vestigated in primary research, at least in the social 
sciences. Researchers want to model the effects of 
multiple predictors as well as to control for poten- 
tial confounding variables, and in the context of a 
primary study this is often achieved by including 
such variables in complex models. Results of tech- 
niques like structural equation modeling, hierarchi- 
cal linear modeling and multiple regression have of- 
ten been omitted from meta-analyses because of a 
lack of knowledge about how to synthesize indices 
from these analyses, and because of the complexities 
and assumptions underlying the process of synthe- 
sis. 

The main purposes of this paper are to point out 
the complexities and potential problems in synthe- 
sizing slopes from regression models, to describe ex- 
isting methods for summarizing slopes and to present 
a new synthesis approach based on generalized least 
squares estimation. We focus only on the case of 
multiple regression, though clearly other analyses 
involve regression-like models with similar assump- 
tions. We begin with the simple case where all stud- 
ies examine very similar models and discuss tech- 
niques for estimating a combined regression model 
across studies. Modeling to examine the impact of 
study features, design differences and study quality 
(e.g., Pang, Drummond and Song, 1999) is touched
on briefly. Other complications such as publication 
bias (e.g., Doucouliagos (2005); Stanley (2005)) are
beyond the scope of our discussion. 

Consider a model in study i relating some predictors $X_1$ through
$X_P$ to an outcome Y for case j. Specifically, in study i,

(1) $Y_{ij} = \beta_{i0} + \beta_{i1}X_{ij1} + \cdots + \beta_{iP}X_{ijP} + e_{ij}$

for $j = 1$ to $n_i$ cases. The usual assumptions of normality and
homoscedasticity of errors apply such that $e_{ij} \sim N(0, \sigma_i^2)$,
and linearity of the X-Y relations is also assumed within each study.
Often in a synthesis one predictor (let us say $X_1$) is of primary
interest; below we refer to this as the focal predictor. Now assume we
have a series of studies i = 1 to k, and each of them involves a
regression with $X_1$ as a predictor of Y; typically also other
predictors (say, $X_2$ through $X_P$) appear in these studies. We may
wish to summarize the slopes representing the relation of $X_1$ to Y
(estimates of $\beta_{11}$ through $\beta_{k1}$), and on occasion
perhaps to summarize all P slopes for $X_1$ through $X_P$, across the
k studies.

While syntheses of slopes imply a variety of fairly 
stringent assumptions, this has not deterred 
researchers from combining regression slopes (though 
some existing summaries have been done without re- 
gard to the underlying assumptions). Crouch (1995, 
1996) summarized slopes from a diversity of mod- 
els representing tourism demand and Lau and col- 
leagues (Lau, Sigelman, Heldman and Babbitt, 1999) 
used regressions to examine the effectiveness of neg- 
ative political advertisements. Farley, Lehmann and 
Sawyer (1995) encouraged marketing researchers to 
synthesize regression slopes, and more recently 
Peterson and Brown (2005) reviewed the use and 
synthesis of standardized slopes in meta-analysis in 
the field of psychology. Two controversial and very 
different syntheses of regression results in education 
dealt with the topic of whether educational expen- 
ditures relate to achievement outcomes (Hanushek 
(1989); Hedges, Laine and Greenwald, 1994). A re- 
cent issue of the Journal of Economic Surveys (e.g., 
Roberts (2005); Rose and Stanley (2005)) focused 
exclusively on meta-analyses of regression coefficients 
on a variety of economic topics, and many others 
have synthesized regressions on diverse topics in economics
(e.g., Card and Krueger (1995); Doucouliagos and Paldam (2006)),
in large part thanks to the
seminal work of Stanley and Jarrell (1989). 

In spite of their widespread use in economics, meth- 
ods for summarizing regression slopes have received 
less attention in the statistical literature than
methods for synthesizing other indices used in meta-analysis such as
standardized mean differences, correlations, and proportions (or
transformed proportions such as odds ratios). Analytic approaches may
be proposed in the methods sections of substantive 
syntheses without much attention to the statisti- 
cal behavior of the estimators and tests involved. 
In this article we provide a multivariate formulation 
for the synthesis of slopes, beginning with discus- 
sions of the assumptions required and of problems 
that meta-analysts may encounter when synthesiz- 
ing slopes. We then briefly review several existing 
univariate and multivariate approaches. Most existing approaches are
univariate and thus avoid some, but not all, of the issues and
assumptions that underlie the synthesis of sets of regression slopes.
Other approaches to combining slopes are more complex, but require
access to raw data, which is rarely available in the meta-analysis
context.

2. ASSUMPTIONS AND PROBLEMS IN 
SYNTHESIZING SLOPES 

The synthesis of regression slopes is difficult for 
several reasons, and a variety of problems must be 
dealt with in the process. Problems include nonequiv- 
alence of the metrics for the predictors and outcomes 
across studies, lack of information in study reports 
and the estimation of very diverse models across 
studies. Slopes are identically distributed across stud- 
ies when the outcome Y and the focal predictor X 
are measured similarly, when the same additional 
Xs appear in each study (i.e., the same model is es- 
timated in each study), and when X and Y scores 
are similarly distributed. Each of these conditions is 
often not met across studies, which is a concern for 
the meta-analysis of slopes. We consider each con- 
dition in turn. 

2.1 Y Is Measured Similarly Across Studies 

This consideration is important because even if 
only a single predictor appears in each of a collec- 
tion of k regression equations, the raw regression 
slope in each study depends on the scales of that 
predictor and the outcome. This is evident in the 
language commonly used to describe the raw regres- 
sion slope — the predicted change in the outcome Y 
given one unit change in X. This is also easily seen in the formula
for the slope in a bivariate regression, which is
$b = r_{XY}(S_Y / S_X)$. Here $r_{XY}$ is the correlation between X
and Y, $S_Y$ is the standard deviation of the Y scores and $S_X$ is
the standard deviation of the X scores. For two raw-scale slopes to
be comparable across studies, the scales of Y and X must be the same
(or proportional, e.g., both X and Y could be linearly transformed
using the same transformation). Indeed, for total equivalence of
scales, the measures of Y and X should be equally reliable across
studies, which is rare (Amemiya and Fuller (1984); Hunter and
Schmidt (2004)).
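
The scale dependence of the raw slope can be illustrated with a minimal Python sketch. The data values and the function name `bivariate_slope` are hypothetical and serve only as an illustration. The sketch checks the identity $b = r_{XY}(S_Y / S_X)$ numerically and shows that multiplying the outcome by 10 multiplies the slope by 10 while leaving the correlation unchanged.

```python
import math

def bivariate_slope(x, y):
    """Return (slope, correlation, sd_x, sd_y) for the OLS regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sd_x = math.sqrt(sxx / (n - 1))
    sd_y = math.sqrt(syy / (n - 1))
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    return slope, r, sd_x, sd_y

# Hypothetical data: a predictor (x) and an outcome (y) from one study.
x = [1.0, 3.0, 5.0, 7.0, 9.0]
y = [52.0, 55.0, 61.0, 60.0, 68.0]

slope, r, sd_x, sd_y = bivariate_slope(x, y)
# Gap between the OLS slope and r * (S_Y / S_X); zero up to rounding error.
identity_gap = abs(slope - r * sd_y / sd_x)

# The same outcome expressed on a scale ten times larger.
y_rescaled = [10.0 * yi for yi in y]
slope_rescaled, r_rescaled, _, _ = bivariate_slope(x, y_rescaled)

print(f"slope={slope:.4f}, rescaled slope={slope_rescaled:.4f}")
print(f"r={r:.4f}, rescaled r={r_rescaled:.4f}")
```

Two studies that differ only in the metric of Y would thus report different raw slopes for the same underlying relation.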

We consider an example where Y is a measure of
the quality of teaching — often represented as stu- 
dent achievement. Our examples are drawn from 
an ongoing synthesis of studies of the relationship 
of teacher qualifications to measures of the quality 
of teaching. Across studies student achievement is 
typically measured using different tests of different 
constructs (math, reading, etc.), which may be pre- 
sented as posttests, difference scores or other mea- 
sures of change over time and the like. We have 
identified over 190 studies that examine measures of 
student learning and to date, from 65 studies with 
measures coded in detail, we have identified 79 dif- 
ferent measures of student learning (and coding is 
not complete). At least in this synthesis, Y is not 
measured similarly across studies. 

In some areas, particularly in economics where 
outputs may be monetary, outcomes will be mea- 
sured similarly or can be transformed or adjusted 
to be reasonably similar. For instance, Ashenfelter, 
Harmon and Oosterbeek (1999) examined returns 
to schooling, where the outcome was earnings, and 
earnings scales can be reasonably well equated across 
countries and over time. However, in many areas this 
will not be possible. 

2.2 Focal X Is Measured Similarly Across 
Studies 

This is also a problematic assumption. Again in 
some realms, such as the study of economic inputs 
measured in dollars or other forms of currency, this 
may not be an issue (e.g., per pupil expenditures 
were examined by Hedges, Laine, and Greenwald 
(1994)). In the Ashenfelter, Harmon and Oosterbeek 
(1999) review, schooling was apparently measured in 
years, which would also be comparable across stud- 
ies. Even in such cases, however, adjustments (e.g., 
for inflation, for exchange rates) will sometimes be 
required. Also when the index of study results is 
an elasticity (common in economics) and represents 
proportional change in X and Y, the scale of X may 
not be as critical. In other areas, however, the focal
X's may not be measured similarly. Shi and Copas
(2004) noted that exposure (dose) variables are of- 
ten measured categorically in medical dose-response 
studies. They referred to the problem of having such 
categorizations (which can vary across studies) as 
the problem of "grouped dose levels." 

In our synthesis of the literature on teacher quali- 
fications, studies examine such predictors as degrees 
earned, counts of courses taken, numbers of credits 
taken, performance on teacher tests and teaching 
experience. Some of these (e.g., counts of courses 
taken) may be measured fairly similarly across most 
studies, while others (teacher test performance) are 
not. Even such things as teaching experience are not 
always measured as ratio-scale, continuous variables 
(e.g., years of experience). We have found such vari- 
ations as dichotomies representing novice versus ex- 
perienced teachers, categorical representations (e.g., 
teachers with 0-5 years, 6-10 years, or more than
10 years experience) and years transformed to represent nonlinear
effects (e.g., squared years of experience).

2.3 Same Additional X's Across Studies 

This condition is virtually never met. In practice, 
studies nearly always estimate different models. In 
fact it can be argued that differences in the models 
analyzed should be expected across studies, as re- 
searchers develop and elaborate on models present 
in the literature in attempts to refine prediction and 
to explain additional variation in the outcomes of 
interest. Stanley and Jarrell (1989) raised a concern 
about differences in models in the context of model 
specification, and argued that syntheses of regres- 
sion results should examine aspects of models such 
as the functional form of the variables involved and 
differences in the independent variables included in 
the regressions. Many economists have dealt with 
this issue by modeling regression slopes or other in- 
dices of effect as functions of dummy variable pre- 
dictors that represent differences in model specifica- 
tion (e.g., Doucouliagos and Paldam (2006); Stanley 
(2001)). 

Some examples of how models can vary widely 
come from the literature on teacher qualifications 
and the quality of teaching. Wu and Becker (2004) 
examined regression models of the impact of teacher 
experience on student outcomes based on two large- 
scale survey data sets: the Coleman Equality of Ed- 
ucational Opportunity (EEO) data (Coleman et al. 
(1966)) and the National Education Longitudinal 
Study: 1988 (NELS:88) (e.g., Ingels et al. (1992)).
Wu and Becker found 92 different models for the 
prediction of student achievement in 12 studies us- 
ing the EEO data set. Nine of those had examined 
teacher experience. Similarly, 55 different models ap- 
peared in the 11 studies based on the NELS:88 data 
set; 6 of those models had examined teacher expe- 
rience. More critically, other than teacher experi- 
ence, the 9 models using the EEO data set together 
contained 122 different additional independent vari- 
ables, and the 6 models based on the NELS:88 data 
set contained 103 other independent variables. The 
regression models contained a diversity of additional 
variables, including socioeconomic status, teacher 
salary, teacher/pupil ratio, school characteristics, student
and family characteristics, and the like.

The question of whether models are similar across 
studies is important because the metric of the raw 
slope for X depends on both the outcome (Y) and
X, and because of model specification issues. In par- 
ticular, each slope's precision, degree of bias and co- 
variation with other slopes depend on the other X's 
in the model. To the extent that a model is not prop- 
erly specified, all slopes in the model are potentially 
biased. 

Also, how intercorrelated the slopes are (i.e., the degree of
multicollinearity) depends on what Xs are included. The covariance
matrix $\mathrm{Cov}(\mathbf{b}_i)$, where $\mathbf{b}_i$ is the
vector of slopes for study i, contains this information. [Notation
and formulas for $\mathrm{Cov}(\mathbf{b}_i)$ are introduced below.]
One simple example suffices to make this point: Consider the slope
for $X_1$ when there is only one additional X in the model (say
$X_2$). The correlation between the slopes for $X_1$ and $X_2$
[i.e., $\mathrm{Corr}(b_{X_1}, b_{X_2})$] is the negative of the
bivariate correlation $\mathrm{Corr}(X_1, X_2)$ between $X_1$ and
$X_2$ (Stapleton (1995)). When additional X's are added the slope
covariances depend on the partial correlation between $X_1$ and
$X_2$, controlling for other X's.
Even this simple fact reveals that each slope's dis- 
tribution depends on other predictors in the model. 
However, in practice, the covariance matrix among 
the slopes in primary studies is rarely reported 
(though matrices of correlations among predictors 
are sometimes reported). So it will be unusual to 
find full $\mathrm{Cov}(\mathbf{b}_i)$ matrices in published studies, and
in such cases caution may be needed in synthesizing 
slopes from very different models. 
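
The two-predictor case can be checked numerically. The following Python sketch uses hypothetical predictor values and function names. It assumes an OLS model with an intercept, so that the covariance matrix of the two slopes is proportional to the inverse of the centered cross-product matrix of the predictors. Because the error variance cancels when covariances are converted to a correlation, no outcome data are needed.

```python
import math

def centered_cross_products(x1, x2):
    """Centered sums of squares and cross-products for two predictors."""
    n = len(x1)
    m1 = sum(x1) / n
    m2 = sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    return s11, s22, s12

def predictor_correlation(x1, x2):
    """Bivariate correlation between the two predictors."""
    s11, s22, s12 = centered_cross_products(x1, x2)
    return s12 / math.sqrt(s11 * s22)

def slope_correlation(x1, x2):
    """Correlation between the OLS slopes of x1 and x2 (intercept in model).

    Cov(b) is sigma^2 times the inverse of the 2 x 2 centered cross-product
    matrix; sigma^2 cancels in the correlation.
    """
    s11, s22, s12 = centered_cross_products(x1, x2)
    det = s11 * s22 - s12 ** 2
    v11 = s22 / det
    v22 = s11 / det
    v12 = -s12 / det
    return v12 / math.sqrt(v11 * v22)

# Hypothetical predictor scores for six cases.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]

r_predictors = predictor_correlation(x1, x2)
r_slopes = slope_correlation(x1, x2)
print(f"Corr(X1, X2) = {r_predictors:.4f}, Corr(b1, b2) = {r_slopes:.4f}")
```

The two printed values have equal magnitude and opposite sign, which is why adding or removing a correlated predictor changes the joint distribution of the remaining slopes.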

The extent to which differences in the models estimated across
studies lead to important differences in slopes across studies is
unclear. Therefore, the question of model specification is relevant
here. If all
of the different versions of models are (reasonably) 
well specified, then each one should provide unbiased 
and relatively independent estimates of the regres- 
sion slopes. The impact of model differences likely 
depends on both model specification and on the rela- 
tionships of the focal X to the additional variables. 
Let us consider the focal predictor or some "base 
set" of predictors that appear in a well specified 
model. If additional variables are relatively indepen- 
dent of the predictors in the base set, the slopes of 
the base set of predictors (and their distributions) 
may not be much affected by the addition of those 
new variables. However, to the extent that added
variables are highly correlated with the base pre- 
dictors or with the outcome, the slopes of the base 
predictors will differ and will also be biased. We 
suspect that there will be some limitations to the 
application of the estimation approach shown here 
when the models used across different studies dif- 
fer widely, and in particular when some suffer from 
multicollinearity or other forms of misspecification. 
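
A small Python sketch, working from hypothetical population variances and covariances rather than from any study's data, illustrates the point about added variables. It compares the slope for a focal predictor $X_1$ estimated alone with its slope when a second predictor $X_2$ is added, using the standard two-predictor least squares solution. The function names and numeric values are assumptions made for illustration.

```python
def slope_x1_alone(var_x1, cov_x1_y):
    """Slope of X1 when it is the only predictor of Y."""
    return cov_x1_y / var_x1

def slope_x1_with_x2(var_x1, var_x2, cov_x1_x2, cov_x1_y, cov_x2_y):
    """Slope of X1 when X2 is also in the model (two-predictor normal equations)."""
    det = var_x1 * var_x2 - cov_x1_x2 ** 2
    return (var_x2 * cov_x1_y - cov_x1_x2 * cov_x2_y) / det

# Hypothetical moments: unit variances, both predictors related to Y.
var_x1, var_x2 = 1.0, 1.0
cov_x1_y, cov_x2_y = 0.5, 0.4

b_alone = slope_x1_alone(var_x1, cov_x1_y)

# Added predictor uncorrelated with the focal predictor: slope is unchanged.
b_uncorrelated = slope_x1_with_x2(var_x1, var_x2, 0.0, cov_x1_y, cov_x2_y)

# Added predictor correlated 0.6 with the focal predictor: slope shifts.
b_correlated = slope_x1_with_x2(var_x1, var_x2, 0.6, cov_x1_y, cov_x2_y)

print(b_alone, b_uncorrelated, b_correlated)
```

Under these assumed moments the focal slope is the same with or without an uncorrelated added variable, but it drops from 0.5 to about 0.41 when the added variable correlates 0.6 with the focal predictor.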

Some empirical investigations have attempted to 
shed light on the role of additional primary-study 
predictors on regression slopes. Peterson and Brown 
(2005) found no impact of either sample size or the 
number of additional predictors in regression models 
on the relation between the standardized slope and 
a corresponding zero-order correlation. Their analy- 
ses included slopes from an incredibly wide range of 
areas and encompassed different predictors and out- 
comes from studies in psychology, sociology, market- 
ing and management. It is possible that by looking 
across so many diverse regression models the im- 
pact of the nature of the models would be diluted. 
Ashenfelter and colleagues (1999) attempted a more 
nuanced investigation in their review of studies of 
returns to schooling: they assessed the importance 
of the presence of controls for ability and measure- 
ment error in the primary-study regressions. Their 
analyses suggested a complicated pattern of impact 
of ability controls, with effects for returns to school- 
ing in the United States increasing when ability was 
controlled and effects in non-U.S. studies decreasing. In contrast,
the inclusion of controls for measurement error did not appear to
significantly affect
the slopes. 

An analysis by Doucouliagos and Paldam (2006) 
examined models for the effects of economic devel- 
opment aid on the accumulation of capital in the 
countries that receive such aid. To explore differences in sample
and model specification, their analyses examined aid-effectiveness
elasticities as the outcome and included as many as 11 dummy variable
predictors that represented study differences. These 
dummy variables represented the type of model ex- 
amined in the primary study (e.g., fiscal response 
models versus growth equations), the nature of the 
data set used (its type and countries included) and 
the presence of three different control variables. Con- 
trols for endogeneity and the model type variables 
had significant impacts on the elasticities, as did 
sample size. In this analysis, differences in the forms 
of the models examined in the primary research 
played a large role in the synthesis results. 

3. EXISTING METHODS FOR SUMMARIZING 
SLOPES 

Next we examine methods that have been pro- 
posed for synthesizing regression slopes. These tech- 
niques have been described in the literature, but 
some do not appear to have been used in meta- 
analytic practice. 

3.1 Summaries of Slopes or Functions of Slopes 

Several authors have used direct and simple sum- 
maries of slopes or differences in slopes, although 
none has provided a clear statistical justification 
for the approaches used. Jarrell and Stanley (1990) 
used slopes for a dummy variable that represented 
union membership (from regression models predict- 
ing log wage values) in a review of the differences 
in wages between union and non-union workers. In 
a similar analysis, Stanley and Jarrell (1998) exam- 
ined the gender gap in wages. Using ordinary least 
squares (OLS) regression analyses, Jarrell and Stan- 
ley examined two models for the wage gap due to 
union membership. One had 20 predictors represent- 
ing differences in sample and model specification, 
and the other included 77 predictors. The initial 
20 predictors represented differences in model spec- 
ification such as the nature of the wage variables 
used and differences in the samples analyzed (e.g., 
whether blue-collar, white-collar or government 
workers were included). The other model included 
those 20 predictors, plus allowed those predictors 
to vary over time, and also included 10 indicators 
identifying particular data sets that were used and 
27 indicator variables representing multiple findings 
contributed by 27 primary-study authors. 



Jarrell and Stanley applied OLS in spite of ac- 
knowledging that the errors in their model were likely 
to be heteroscedastic, noting that "[v]arious efforts
to adjust for the problem made little difference in 
this application" (Jarrell and Stanley (1990), page 
56). Similarly Stanley and Jarrell used OLS meth- 
ods and tested for homoscedasticity using "conven- 
tional tests" (Stanley and Jarrell (1998), page 961). 
It is not clear why these authors did not find heteroscedasticity,
unless their results arose from samples of roughly equal size. As
will be shown below, slopes typically do not have equal variances
across studies, and thus errors from models with slopes as outcomes
will typically not be homoscedastic either.

3.2 Summaries of t Statistics 

Stanley and Jarrell (1989) encouraged economists 
to summarize regression slopes and suggested using 
the t statistic (i.e., the slope divided by its standard 
error) as an index. They suggested this metric as a 
way to deal with heteroscedasticity of slopes across 
studies, which could occur because of sample size 
differences and differences in precision. They also ar- 
gued that dividing b by its standard error removes 
problems due to use of different scales across stud- 
ies. While summaries of t values have long existed 
(e.g., Walker and Saw (1978)), there are some drawbacks
to their use. First and of greatest concern,
the t contains information on sample size and preci- 
sion as well as effect magnitude. Thus t can become 
large either when the slope itself is large or when 
its standard error is small, which occurs both when 
the sample is large and when there is little varia- 
tion in the regression residuals. Stanley and Jarrell 
argued that the t "is a standardized measure of the 
critical parameter of interest" (2005, page 304), but 
they did not say what the parameter of interest is. 
Clearly t is not an estimator of $\beta$. Also these authors
do not explain whether one can use a summary of 
the t values to obtain a slope estimate after pooling 
or summarizing the t values. Moreover, it is some- 
times difficult to determine the direction of an effect 
from a t test if a slope is not presented and the test 
is not significant or when the researcher reports only 
the absolute value of the t. Given all of these con- 
cerns, t values are likely to be less meaningful than 
other indices based on slopes when findings are to 
be interpreted. 
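
The dependence of t on sample size can be made concrete with a short Python sketch. It assumes a simple bivariate regression, for which the standard error of the slope is the residual standard deviation divided by $S_X\sqrt{n-1}$; the function name and the numeric inputs are hypothetical.

```python
import math

def slope_t_statistic(slope, residual_sd, sd_x, n):
    """t statistic for the slope in a bivariate OLS regression of Y on X.

    Uses SE(b) = residual_sd / (sd_x * sqrt(n - 1)), the standard error of
    the slope when there is a single predictor.
    """
    if n < 3:
        raise ValueError("need at least 3 cases")
    standard_error = residual_sd / (sd_x * math.sqrt(n - 1))
    return slope / standard_error

# Identical slope and residual variation; only the sample size differs.
t_small = slope_t_statistic(slope=0.10, residual_sd=1.0, sd_x=1.0, n=30)
t_large = slope_t_statistic(slope=0.10, residual_sd=1.0, sd_x=1.0, n=3000)

print(f"n=30: t={t_small:.2f}; n=3000: t={t_large:.2f}")
```

With the slope held fixed, t grows in proportion to $\sqrt{n-1}$, so two studies with the same effect magnitude can report very different t values.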

While Stanley and Jarrell did not initially describe exactly how one
would summarize t values, in practice what they and others have often
done is to model ts or functions of ts in terms of predictors that
characterize the regression models in their synthesis. For instance,
Card and Krueger (1995) examined log|t| values representing the
effects of different levels of the minimum wage on employment
rates. They estimated ordinary least squares regres- 
sions for the log|t| values which included as pre- 
dictors the log of the square root of the error de- 
grees of freedom in the primary study, a dummy in- 
dicating whether the data included a subsample of 
teenagers and the number of explanatory variables 
in the primary-study regression model. 

Based on Jarrell and Stanley's recommendation, 
Lau, Sigelman, Heldman and Babbitt (1999) used 
t values in a summary of results from group com- 
parison studies and regression studies that focused 
on the effect of negative political advertisements on 
political campaigns. They found that about one- 
quarter of their data points "come from ordinary 
least squares (OLS) or logistic regression equations, 
and there is no universally accepted method for han- 
dling such data in a meta-analysis" (Lau et al., 1999, 
page 855). To avoid losing data, they extracted t 
statistics associated with regression coefficients that 
represented mean differences between groups exposed 
to negative advertisements and control groups (exposed to no
advertisements or positive advertisements). They converted the t
values into standardized mean differences (ds), via
$d = 2t/\sqrt{df}$. They
argued that the ds obtained via this transformation 
could be combined with other ds from group com- 
parisons. However, to the extent that the primary- 
study regression models included other important 
control variables, these ts likely produced partial ef- 
fect sizes, which do have slightly different distribu- 
tions from "typical" zero-order effect sizes 
(Keef and Roberts (2004)). 
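
As a minimal sketch of the conversion just described, the following Python function applies $d = 2t/\sqrt{df}$ to a single t value. The function name and the input values are hypothetical, and the caveat above still applies: when the t comes from a regression with other control variables, the result is a partial effect size.

```python
import math

def t_to_d(t_value, df):
    """Convert a t statistic to a standardized mean difference via d = 2t / sqrt(df)."""
    if df <= 0:
        raise ValueError("degrees of freedom must be positive")
    return 2.0 * t_value / math.sqrt(df)

# Hypothetical regression result: t = 2.5 on 100 error degrees of freedom.
d_value = t_to_d(2.5, 100)
print(d_value)
```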

Another index that is related to the t value is 
Timm's (2004) "ubiquitous effect size." This index 
can represent a single slope or a linear combination 
of slopes. Timm's index resolves the problem of de- 
pendence of the t on sample size because it incorpo- 
rates a multiplier that reduces the influence of the 
sample size on its value. However, to date Timm's 
index has not been used in syntheses of slopes and 
Timm did not provide methods for synthesizing his 
index. 



3.3 Iterative Least Squares Regressions 

An iterative GLS approach was proposed by Hanu- 
shek (1974) to summarize slopes representing re- 
turns to schooling. Hanushek's method optimally re- 
quires the raw data in order to estimate a covariance 
matrix among the slopes. However, he suggested 
an alternative approach whereby part of the covari- 
ance matrix could be estimated using OLS regres- 
sion across studies and the estimate obtained from 
this step would then be added to a function of raw 
data from the original (within-study) regressions. To 
the extent that the approach requires raw data and 
infrequently reported summary values from the orig- 
inal studies, it will not be applicable in many meta- 
analytic settings. 

3.4 Dose-Response Models in Epidemiology 

Greenland (1987), Greenland and Longnecker 
(1987) and Shi and Copas (2004) considered slopes 
that relate the amount of exposure to some sub- 
stance to odds-ratio outcomes. Typical studies re- 
late levels of exposure (e.g., to alcohol, to smoke 
as in passive smoking, etc.) to outcomes including 
diagnoses of various kinds of cancer and other dis- 
eases. These studies fit into the regression frame- 
work because researchers want to know whether the 
level of exposure to some substance predicts higher 
levels of problematic outcomes (e.g., higher rates of 
cancer). Some issues are similar to those for con- 
tinuous outcomes, but the outcome metric differs in 
these cases because it is typically a dichotomy (sur- 
vival versus death, presence of some disease versus 
no disease, etc.). In the epidemiology literature typ- 
ical fixed and random-effects syntheses of the dose- 
response slopes have been conducted (weighting by 
the within-study slope variances), and the issue of 
dependence has been addressed by incorporating a 
within-study covariance between odds ratios (at dif- 
ferent exposure levels) into the analyses. This covari- 
ance is different from the covariance between slopes, 
which is incorporated in our methods below. 

Shi and Copas (2004) argued for the use of maxi- 
mum likelihood estimators of the mean dose-response 
slope and a between-studies variance component for 
the slopes, and they also describe a likelihood test 
of homogeneity of the dose-response slopes. Shi and 
Copas considered a bivariate regression because only 
one predictor (exposure to the dosing variable) was 
used in the within-study model. They argued that 
their approach is also approximate for adjusted odds 
ratios (e.g., adjusted for age or other predictors),
provided the adjustments are not great. The ad- 
justed odds ratio case is similar to the typical sit- 
uation in most areas of social science, where multi- 
ple control variables are included in each regression 
model. 

3.5 Validity-Generalization Approaches 

A considerable literature exists concerning the synthesis
of test validities (e.g., Hunter and Schmidt
(2004)). This area is known as validity generaliza- 
tion, with a key issue being whether test validi- 
ties generalize (i.e., can be applied reasonably well) 
across job types and job settings. Test validities are 
typically indices that represent the relation between 
a predictive test (e.g., an employment selection test) 
and some later outcome such as job performance. A 
key issue in this literature is the effect of differen- 
tial test reliability across studies, thus corrections 
for measurement error are a standard part of the 
validity-generalization approach. While test validities are most
often represented by correlation coefficients, on occasion more
complex regression models are used to examine test validity. Raju,
Fralicx and Steinhaus (1986) estimated the mean slope
and between-studies variation in slopes with correc- 
tions for unreliability in X, where X is a predic- 
tive test whose validity is of interest. Their meth- 
ods parallel those presented by Hunter and Schmidt 
(2004) for analyzing correlation coefficients, which 
have been controversial in the meta-analysis literature
(see, e.g., Hedges (1988)). Even so, they were
used by Crouch (1995, 1996) and Root and col- 
leagues (2003). Later Raju, Pappas and Williams 
(1989) conducted an empirical Monte Carlo study 
of a validity data base to examine the performance 
of methods using slopes and correlations and covari- 
ances to represent validities. 

3.6 Weighted Least Squares (Univariate) 
Approaches 

The weighted least squares (WLS) approach was 
used by Bini, Coelho and Diniz-Filho (2001), who 
cited Hedges and Olkin (1985) as the basis of their 
approach. Greenland and Longnecker (1987) also described this
approach. If we consider the model in (1) above relating some Xs to
Y for person j in study i, we may want an estimate of the slope for
one predictor, say $X_1$. Estimating model (1) in each of k studies
(using the same estimation method, such as ordinary least squares)
produces independent and approximately normally distributed
estimates of the population slopes
$\beta_{11}, \beta_{21}, \ldots, \beta_{k1}$. If we denote those
estimates as $b_{11}, b_{21}, \ldots, b_{k1}$, we can use least
squares methods to summarize the slopes. Thus, for instance, we can
compute the combined slope $b_{\cdot 1}$,

(2) $b_{\cdot 1} = \dfrac{\sum_{i=1}^{k} w_{i1} b_{i1}}{\sum_{i=1}^{k} w_{i1}}$,

where k is the number of slopes combined, $b_{i1}$ is the slope for
$X_1$ from study i and $w_{i1}$ is the weight for that slope in the
ith study, which is the reciprocal of the slope variance
[$w_{i1} = 1/\mathrm{Var}(b_{i1})$]. The variance of $b_{\cdot 1}$
is given as

(3) $V(b_{\cdot 1}) = \dfrac{1}{\sum_{i=1}^{k} w_{i1}}$.

This approach could also be applied to partial corre- 
lations or standardized regression slopes. If standard 
errors were not available, one could weight by sam- 
ple size, as the relevant standard errors are typically 
a function of n or of the degrees of freedom for the 
regression model. 
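
The univariate computation in (2) and (3) can be sketched in a few lines. This is a minimal illustration, assuming each study reports a slope for the focal predictor and its standard error; the function name `wls_combined_slope` and the input values are hypothetical and do not come from any study cited here.

```python
def wls_combined_slope(slopes, std_errors):
    """Inverse-variance weighted mean slope and its variance, as in (2) and (3)."""
    if len(slopes) != len(std_errors) or not slopes:
        raise ValueError("need one standard error per slope")
    weights = [1.0 / se ** 2 for se in std_errors]   # w_i1 = 1 / Var(b_i1)
    total_weight = sum(weights)
    combined = sum(w * b for w, b in zip(weights, slopes)) / total_weight
    variance = 1.0 / total_weight                     # V(b_.1) = 1 / sum of weights
    return combined, variance


# Hypothetical slopes and standard errors from three studies.
b_dot1, var_b_dot1 = wls_combined_slope([0.20, 0.30, 0.25], [0.10, 0.10, 0.05])
```

The third study has half the standard error of the others, so it receives four times their weight; if only sample sizes were available, they could be passed through in place of the inverse variances under the approximation described above.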

3.7 Multivariate Bayesian Approach 

One last proposed method for simultaneously es- 
timating a set of regression models was given by 
Novick and colleagues (Novick, Jackson, Thayer and 
Cole, 1972) in the context of the validity of college- 
admissions prediction, where all predictors are con- 
sistently measured across colleges. Furthering a 
Bayesian method attributed to Lindley, the authors 
argued for a multistage Bayesian formulation involv- 
ing raw data, its parameters (the slopes) and hyper- 
parameters (e.g., the variances of the slopes). How- 
ever, while the method constitutes an improvement 
beyond the methods above because it is multivariate 
and uses simultaneous estimation, this approach re- 
quires full access to the raw data so is not applicable 
in the meta-analytic context. 

4. MULTIVARIATE GLS APPROACH 

Most of the analyses presented above are reasonable if one wants to
synthesize estimates of a single population slope and if most of the
studies involving that slope examine simple models. However, within
the ith sample, the P + 1 slopes $b_{i0}, b_{i1}, \ldots, b_{iP}$ are
often correlated and there may be interest in obtaining an overall
regression model (rather than a single slope estimate). To synthesize
slope vectors $b_1, b_2, \ldots, b_k$, we need generalized least
squares (GLS) methods, primarily because of the unequal variances of
effects for studies of different sizes. [Stanley and Jarrell argued
that one could obtain estimates of the vector of slopes by solving a
system of equations with the slopes as endogenous variables (Stanley
and Jarrell (1989), page 169). However, they did not discuss exactly
how to do so or how to deal with the fact that within each study the
slopes will be intercorrelated.] An overview of the use of GLS for
dependent standardized-mean-difference effect sizes was given by
Raudenbush, Becker and Kalaian (1988), and we apply a similar
approach here to sets of slopes.

To use GLS, we need estimates of the P + 1 slopes from each of the k
samples (this includes the intercept $b_{i0}$) and their covariance
matrices $\mathrm{Cov}(b_i)$. It is also possible to include studies
that examine subsets of the P predictors; we comment on how this
would be done as we discuss details of the approach. Within sample i,
the OLS estimate of
$\beta_i = (\beta_{i0}, \beta_{i1}, \ldots, \beta_{iP})'$ is
frequently reported. The estimator is

$b_i = (b_{i0}, b_{i1}, \ldots, b_{iP})' = (X_i'X_i)^{-1}X_i'Y_i$,

with $\Sigma_i = \mathrm{Cov}(b_i) = (X_i'X_i)^{-1}\sigma_i^2$, where
$X_i$ is the matrix of predictor values in the ith sample, plus a
constant if the intercept is included. Typically $\sigma_i^2$ is not
known, but rather is estimated with $S_i^2$, the mean squared error
(MSE) of the regression in study i. In large samples, $b_i$ is
normally distributed with mean $\beta_i$ and variance $\Sigma_i$,
which is the basis for the GLS approach. We will assume a common
fixed-effects model (see Hedges and Vevea (1998)) which presumes that
all samples incorporate the same P predictors in the within-study
regression model, and also assume that the vectors $b_i$ estimate a
common population slope vector $\beta$.

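We stack the k sample slope vectors and make a blockwise diagonal
matrix of the $\mathrm{Cov}(b_i)$ matrices, then apply GLS
estimation. First define

$$
b = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_k \end{bmatrix}
\quad\text{and}\quad
\Sigma = \mathrm{Cov}(b) =
\begin{bmatrix}
\mathrm{Cov}(b_1) & 0 & \cdots & 0 \\
0 & \mathrm{Cov}(b_2) & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \mathrm{Cov}(b_k)
\end{bmatrix}.
$$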

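Olkin (2003) pointed out that in some cases the covariance matrices
$\mathrm{Cov}(b_i)$ could be pooled; below we discuss the case where
a pooled MSE is available.

Then under the assumption that each slope vector $b_i$ is estimating
$\beta$, we have the model

$$
b =
\begin{bmatrix}
b_{10} \\ b_{11} \\ \vdots \\ b_{1P} \\ \vdots \\ b_{k0} \\ \vdots \\ b_{kP}
\end{bmatrix}
= W\beta + e =
\begin{bmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & 1 \\
\vdots & & & \vdots \\
1 & 0 & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{bmatrix}
\begin{bmatrix} \beta_0 \\ \vdots \\ \beta_P \end{bmatrix}
+ e.
$$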

The slopes are modeled as a function of $\beta$ (the vector of P + 1
population slopes) and a design matrix W composed of zeros and ones
that identify which slopes are estimated in each sample. When all
samples examine the same predictors, a stack of (P + 1) x (P + 1)
identity matrices serves as W in the model $b = W\beta + e$, with
$\mathrm{Cov}(e) = \mathrm{Cov}(b)$ from above. If the samples do not
all estimate the same model (i.e., some models use fewer than the
full set of P predictors), we can still use the GLS formulation to
include those results in the synthesis. In such cases the component
of W that represents a sample with fewer than P predictors would not
be a full identity matrix; row p + 1 of the identity matrix for
sample i would be omitted if the pth predictor was not included in
study i. However, as mentioned above, the interpretations (and
distributions) of slopes from reduced models would not be exactly the
same as for slopes from models with all P predictors, and estimation
of such quantities as $\sigma_i^2$ will be more complicated because
$\sigma_i^2$ and $\sigma_{i'}^2$ (from samples i and i') may
represent different population quantities if different sets of
predictors were examined in samples i and i'.

It is also possible to modify this approach somewhat to examine the
influence of particular additional predictors on a focal predictor's
slope. For instance, suppose the focus was on the role of teacher
verbal ability as a predictor of student achievement. It could be of
interest to see whether the slope for teacher verbal ability is
different when a measure of students' prior achievement is included
in the model. This could be accomplished by adding a column to the W
matrix that would contain a 1 in each row representing a
verbal-ability slope that came from a model that also included prior
achievement. A small example illustrates this idea. Suppose that
$X_1$ is teacher verbal ability, $X_2$ is prior achievement and $X_3$
is socioeconomic status. Two studies are available, only one of which
(say study 1) includes prior achievement. The GLS model would be

$$
b =
\begin{bmatrix}
b_{10} \\ b_{11} \\ b_{12} \\ b_{13} \\ b_{20} \\ b_{21} \\ b_{23}
\end{bmatrix}
= W\beta + e =
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0
\end{bmatrix}
\begin{bmatrix}
\beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \\ \gamma_1
\end{bmatrix}
+ e.
$$

The last column of W contains a 1 in row 2 (the row for the
verbal-ability slope in study 1), showing that the first study
included prior achievement ($X_2$) in the study-level model. Row 6,
the row for the verbal-ability slope in study 2, does not have a 1 in
the last column because study 2 did not include $X_2$. Also study 2
does not have a row of W with a 1 in the column for $\beta_2$
(column 3), because there is no estimate of $\beta_{22}$.

This model contains a fifth parameter, denoted $\gamma_1$ in the
display, that represents the difference in the slope of teacher
verbal ability when prior achievement is controlled. From this model
we can determine that for study 1, the expected value of $b_{11}$ is
$\beta_1 + \gamma_1$, while for study 2, $E[b_{21}] = \beta_1$. By
including additional columns for key control variables or for other
study features, the meta-analyst can examine hypotheses about whether
the focal slope is affected by those elements of the original studies
and their regression models. The details of such tests are described
below. We caution, however, that including large numbers of added
predictors may lead to multicollinearity; meta-analysts would
therefore be wise to examine their models carefully for the presence
of this problem.
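
The design matrix for the two-study example can be assembled mechanically. The following is a rough sketch under the column ordering used in that example (intercept, three predictor slopes, then the added parameter for the focal slope); the helper name `design_rows` and its arguments are hypothetical.

```python
# Columns of W: beta_0, beta_1 (verbal ability), beta_2 (prior achievement),
# beta_3 (SES), gamma_1 (shift in the verbal-ability slope when X2 is controlled).
PREDICTORS = 3


def design_rows(included, focal=1, control=2):
    """Rows of W for one study; `included` lists the predictor indices it used."""
    rows = []
    for p in [0] + sorted(included):          # index 0 is the intercept
        row = [0] * (PREDICTORS + 2)
        row[p] = 1
        if p == focal and control in included:
            row[-1] = 1                       # focal slope from a model with the control
        rows.append(row)
    return rows


# Study 1 used X1, X2 and X3; study 2 used only X1 and X3.
W = design_rows([1, 2, 3]) + design_rows([1, 3])
```

Each study contributes one row per reported coefficient, so a study that omits a predictor simply contributes fewer rows, and further study features could be handled by appending more indicator columns in the same way.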

Regardless of the components of W and $\beta$, we estimate $\beta$
and its covariance as

$\beta^* = (W'\Sigma^{-1}W)^{-1}W'\Sigma^{-1}b$

and

$\mathrm{Cov}(\beta^*) = (W'\Sigma^{-1}W)^{-1}$.

Often, as noted above, we do not know $\mathrm{Cov}(b) = \Sigma$;
thus we typically substitute an estimate, which we shall denote V,
and compute instead

(4)  $\hat{\beta} = (W'V^{-1}W)^{-1}W'V^{-1}b$

and

(5)  $\mathrm{Cov}(\hat{\beta}) = (W'V^{-1}W)^{-1}$.

With large samples and under typical regularity conditions,

$\hat{\beta} \sim N(\beta, \mathrm{Cov}(\hat{\beta}))$;

thus confidence intervals for each element of $\beta$ are available,
using $\hat{\beta}_p \pm z_{1-\alpha/2}\sqrt{C_{pp}}$, where
$z_{1-\alpha/2}$ is the upper tail $1 - \alpha/2$ critical value of
the standard normal distribution and $C_{pp}$ is the pth diagonal
element of the $\mathrm{Cov}(\hat{\beta})$ matrix, the variance of
$\hat{\beta}_p$. Also a test of the hypothesis that the pth slope
$\beta_p = 0$ can be obtained via

$Z = \dfrac{\hat{\beta}_p}{\sqrt{C_{pp}}}$,

which is a standard normal deviate under the null hypothesis that
$\beta_p = 0$. The value of Z is compared to the cutpoints of the
standard normal distribution.
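
As an illustration of the interval and test just described, the sketch below computes Z, a two-sided p value and the confidence limits for one synthesized slope using only the Python standard library. The function name `slope_inference` and the input values are hypothetical; the variance argument plays the role of $C_{pp}$.

```python
from math import sqrt
from statistics import NormalDist


def slope_inference(estimate, variance, alpha=0.05):
    """Large-sample z statistic, two-sided p value and (1 - alpha) interval."""
    se = sqrt(variance)                               # square root of C_pp
    z = estimate / se
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)      # z_{1 - alpha/2}
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, (estimate - z_crit * se, estimate + z_crit * se)


# Hypothetical synthesized slope of 0.25 with estimated variance 0.0004.
z, p, (lower, upper) = slope_inference(0.25, 0.0004)
```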

Several other tests are available as well. A test of model fit, which
is essentially a test of homogeneity of the regression intercepts and
slopes across samples and across predictors, is given by

$Q_E = (b - W\hat{\beta})'V^{-1}(b - W\hat{\beta})$,

which has a large-sample chi-squared distribution with
$(k - 1)(P + 1)$ degrees of freedom if all slopes and intercepts are
included. If F additional columns are added to W to represent study
features, then the degrees of freedom will be $(k - 1)(P + F + 1)$.
If the magnitudes of the intercepts are not of interest, a modified
$Q_E$ test can also be computed by including only the predictor
slopes, thus reducing the dimension of W and including only those
values of interest in b, $\hat{\beta}$ and $V^{-1}$. In that case,
$Q_E$ is chi-squared with $(k - 1)P$ degrees of freedom. If $Q_E$ is
large relative to cutpoints of the appropriate chi-squared
distribution, the slopes vary beyond what one would expect to see
given only sampling variability.

A test of the composite hypothesis that $\beta = 0$ is given by

$Q_B = \hat{\beta}'[\mathrm{Cov}(\hat{\beta})]^{-1}\hat{\beta}$,

which is chi-squared with P + 1 degrees of freedom under the null
hypothesis that $\beta = 0$, or with P degrees of freedom if only
predictor slopes are included (see, e.g., Hedges and Olkin (1985)).

4.1 Special Cases of the GLS Approach 

The problem with the approach just described is
that it is extremely rare to find the full covariance 
matrix of the slopes Cov(b) reported in a primary 
research study. Thus it is useful to note that the 
estimator shown in (4) simplifies to the weighted 
least squares (WLS) univariate estimator given in 
(2) if the off-diagonals in Cov(b) or V are set equal 
to zero. 
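
The estimator in (4) and (5), and its reduction to the univariate WLS result when the off-diagonal elements are zeroed, can be sketched with NumPy. This is an illustrative sketch for the case where every study fits the same model, so that W is a stack of identity matrices; the function name `gls_synthesis` and the two-study input values are hypothetical.

```python
import numpy as np


def gls_synthesis(slope_vectors, cov_matrices):
    """GLS estimate (4) and covariance (5) when every study fits the same model."""
    k = len(slope_vectors)
    dim = len(slope_vectors[0])
    b = np.concatenate([np.asarray(v, dtype=float) for v in slope_vectors])
    W = np.vstack([np.eye(dim)] * k)                  # stacked identity matrices
    V = np.zeros((k * dim, k * dim))
    for i, cov in enumerate(cov_matrices):            # block-diagonal V
        V[i * dim:(i + 1) * dim, i * dim:(i + 1) * dim] = cov
    V_inv = np.linalg.inv(V)
    cov_beta = np.linalg.inv(W.T @ V_inv @ W)
    beta_hat = cov_beta @ W.T @ V_inv @ b
    return beta_hat, cov_beta


# Hypothetical two-study input: (intercept, slope) vectors and their covariances.
slopes = [[2.0, 0.30], [3.0, 0.20]]
covs = [np.array([[1.00, -0.05], [-0.05, 0.010]]),
        np.array([[0.50, -0.02], [-0.02, 0.004]])]
beta_gls, cov_gls = gls_synthesis(slopes, covs)

# Zeroing the off-diagonal elements reproduces the univariate WLS estimates.
diag_covs = [np.diag(np.diag(c)) for c in covs]
beta_wls, cov_wls = gls_synthesis(slopes, diag_covs)
```

Building the full V matrix keeps the code close to the formulas; with many studies one would more likely accumulate the per-study inverse covariance blocks instead of inverting one large matrix.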

Another special case is one in which it is possible to pool the
estimates of $\sigma_i^2$ across studies. If all studies examine the
same model and separate estimates of $\sigma_i^2$ are available, then
it is possible to remove the MSE values from the Cov(b) matrices and
use a blockwise diagonal matrix $X^*$ containing the
$(X_i'X_i)^{-1}$ matrices in place of V in formulas (4) and (5). It
is shown in the Appendix that

(6)  $\hat{\beta} = (W'(X^*)^{-1}W)^{-1}W'(X^*)^{-1}b$

produces an estimate of $\beta$ equivalent to the value that would be
obtained from a pooled sample. This is because $(X^*)^{-1}$ is a
blockwise diagonal matrix containing the values of $X_i'X_i$, and the
product $W'(X^*)^{-1}W$ sums the $X_i'X_i$ values across the k
studies. Similarly $W'(X^*)^{-1}b$ equals the sum across studies of
the values of the products $X_i'Y_i$, leading to equivalence with the
estimator based on the pooled sample.
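
The equivalence between estimator (6) and OLS on the pooled sample can be checked numerically. The sketch below uses simulated data for three hypothetical studies (the sample sizes, coefficients and random seed are arbitrary assumptions) and relies on the fact that, with W a stack of identity matrices, the products in (6) reduce to sums over studies.

```python
import numpy as np

rng = np.random.default_rng(0)          # simulated data, for illustration only
XtX_sum = np.zeros((3, 3))
XtXb_sum = np.zeros(3)
X_parts, Y_parts = [], []
for n_i in (40, 55, 70):                # three hypothetical studies
    X_i = np.column_stack([np.ones(n_i), rng.normal(size=(n_i, 2))])
    Y_i = X_i @ np.array([2.5, 0.25, 0.35]) + rng.normal(size=n_i)
    XtX_i = X_i.T @ X_i
    b_i = np.linalg.solve(XtX_i, X_i.T @ Y_i)   # within-study OLS slopes
    XtX_sum += XtX_i                            # W'(X*)^{-1}W accumulates X_i'X_i
    XtXb_sum += XtX_i @ b_i                     # W'(X*)^{-1}b accumulates X_i'X_i b_i
    X_parts.append(X_i)
    Y_parts.append(Y_i)

beta_synth = np.linalg.solve(XtX_sum, XtXb_sum)          # estimator (6)
X, Y = np.vstack(X_parts), np.concatenate(Y_parts)
beta_pooled = np.linalg.solve(X.T @ X, X.T @ Y)          # OLS on the pooled sample
```

Up to floating-point error the two vectors agree, which mirrors the algebra in the Appendix; note that the synthesis step uses only the per-study slopes and cross-product matrices, not the raw data.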

Values of $(X_i'X_i)^{-1}$ can be estimated if each study reports the
covariance matrix for the slopes and $S_i^2$, the estimate of
$\sigma_i^2$ (or other quantities that allow computation of $S_i^2$,
e.g., the variance of the outcome and the $R^2$ for the regression).
Each element of $\mathrm{Cov}(b_i)$ is divided by the estimated MSE:
$(X_i'X_i)^{-1} = \mathrm{Cov}(b_i)/S_i^2$. This method requires that
a pooled value of the MSE (say $S_{\cdot}^2$) be obtained and
multiplied by the covariance of the synthesized slope estimator
computed using $X^*$, to compensate for $S_i^2$ being removed when
$X^*$ is substituted for V. Thus the matrix of covariances among the
synthesized slopes is

(7)  $\mathrm{Cov}(\hat{\beta}) = (W'(X^*)^{-1}W)^{-1}S_{\cdot}^2$.

One possible estimator $S_{\cdot}^2$ could be

$S_{\cdot}^2 = \dfrac{\sum_i dfe_i\,S_i^2}{\sum_i dfe_i}$,

where $dfe_i$ is the degrees of freedom for error in study i.
Unfortunately primary researchers do not always report the value of
$S_i^2$, the mean squared error of the regression model in the
primary study. Given this and the rarity of finding full
$\mathrm{Cov}(b_i)$ matrices, it is expected that this special case
will be relatively uncommon.

4.2 Limitations 

The discussion of special cases draws our attention to the fact that
one weakness of the proposed GLS approach is that it uses the
$\mathrm{Cov}(b_i)$ matrices, which are rarely reported. It is
unlikely, even with more stringent reporting requirements, that
authors will routinely begin to report these matrices, particularly
in primary research studies where many models with large numbers of
predictors are estimated and compared.

There are two possible approaches to this problem. One is to simply
assume the slopes are independent, use the squared standard errors of
the slopes as the diagonal elements of $\mathrm{Cov}(b_i)$ and set
the off-diagonal elements to zero. This produces weighted least
squares estimates. A slightly more conservative approach would be to
assume a common correlation value among all slopes [e.g.,
$\mathrm{Corr}(b_{ip}, b_{ip'}) = 0.2$] and then compute the
off-diagonal elements of each $\mathrm{Cov}(b_i)$ matrix as the
product of the slope standard errors (SEs) and that common
correlation, specifically,
$\mathrm{Cov}(b_{ip}, b_{ip'}) = \mathrm{Corr}(b_{ip}, b_{ip'}) \times SE(b_{ip}) \times SE(b_{ip'})$.

One final point regarding this issue relates to model specification
in the primary research studies in the meta-analysis. That is, if a
model is well specified in study i, there should be no serious
multicollinearity and the degree of covariation among the slopes in
$\mathrm{Cov}(b_i)$ should not be great. In such cases, setting all
off-diagonal elements of $\mathrm{Cov}(b_i)$ to zero would not have
serious consequences. However, reporting conventions in many fields
do not require authors to mention whether multicollinearity was
assessed or to report on multicollinearity diagnostics. So the
meta-analyst must trust that the primary study authors actually
checked for multicollinearity and that any models reported upon are
relatively free from this problem.

5. EXAMPLE

In this example we use data from the base year of 
the National Education Longitudinal Study of 1988 
(NELS:88). NELS:88 is a survey of a national sample 
of high-school students from over 1000 schools. The 
same measures are used across schools and when 
analyzed with proper weights, the full sample repre- 
sents the U.S. grade 10 high-school population from 
1988. In our example we use as "studies" 13 schools 
with samples of more than 45 students; we do not 
use the NELS:88 sampling weights that would pro- 
duce results that reflect the national population. 
Both the school-level sampling weights and the 
within-school weights that could be used to make 
each school's estimates reflective of the population 
of that school were ignored. 

5.1 Model 

Our regression model uses three of the standardized cognitive tests
administered as part of the NELS:88 survey: the science, mathematics
and reading scales. This model views science achievement as a
function of math and reading test performance. Specifically, Y
represents the NELS:88 science achievement test, $X_1$ is the
mathematics test and $X_2$ is the reading test. The model estimated
in study i is

$Y_{ij} = \beta_0 + \beta_1 X_{1j} + \beta_2 X_{2j} + e_{ij}$

for student j, with error $e_{ij}$. We use ordinary least squares to
obtain school-level estimates of this model. Computations were done
using PROC REG and PROC IML in SAS.

Our analyses are based on the item-response-theory estimated
number-right scores for these test batteries; therefore the raw
slopes can be interpreted as the predicted change in the science test
score for a one-item increase in the math or reading test score. The
science test had 25 items, the math test had 40 and the reading test
had 21 items. Means across the 13 schools were 13.5 or 54% correct on
science (SD = 5.7), 24.4 or 61% correct on math (SD = 10.4) and 13.9
or 66% correct for reading (SD = 5.7). The correlation between math
and reading scores was $r_{MR}$ = 0.70, and each predictor was also
correlated with the outcome at about that same level
($r_{MS}$ = 0.70, $r_{RS}$ = 0.67) in the full sample.

5.2 Results 

The regression model with $X_1$ and $X_2$ as predictors of Y was
estimated within each of the schools, and the slope estimates and
fitted models are shown in Table 1. The data from the 13 schools were
also pooled (used as a single sample) and the full model including
intercepts was estimated across all schools (for all 664 cases); this
result is labeled "Full sample." The estimated model from this
analysis of the 13 schools together was
$\hat{Y}_j = 2.552 + 0.245X_{1j} + 0.358X_{2j}$ and it is shown in
the first row of Table 1. (The subscript j has been omitted from the
table entries for simplicity.) Inspection of the models for the 13
schools shows some variation in the slopes and intercepts; the most
unusual looking model is for school 3. Also casual inspection of the
mean squared errors shows some variation in the $S_i^2$ values, with
school 10 showing the smallest value. However, Levene's test suggests
the error variances are not different [F(12, 651) = 1.25, p = 0.25],
indicating that it is reasonable to proceed with the analysis based
on the pooled MSE. (Although here we have the raw data and can
compute Levene's test, in practice other tests that do not require
raw data such as $F_{\max}$ or Cochran's C could be used to test
residual variance equality.)

Table 1
Fitted regressions and MSE values for full sample and 13 schools

Sample   n_i   Fitted regression                  MSE (S_i^2 for school i)
Full     664   2.552 + 0.245X1 + 0.358X2          14.44
1         64   5.470 + 0.219X1 + 0.260X2          17.46
2         59   [illegible] + 0.246X1 + 0.270X2    14.24
3         67   5.619 + 0.040X1 + 0.638X2          14.05
4         45   4.381 + 0.181X1 + 0.392X2          10.75
5         47   4.305 + 0.260X1 + 0.282X2           9.32
6         45   2.346 + 0.185X1 + 0.195X2          14.60
7         45   0.228 + 0.283X1 + 0.339X2           9.80
8         56   2.289 + 0.289X1 + 0.312X2          13.32
9         45   3.600 + 0.248X1 + 0.263X2          12.65
10        51   2.156 + 0.192X1 + 0.498X2           6.50
11        48   3.621 + 0.133X1 + 0.413X2          11.02
12        45   3.144 + 0.250X1 + 0.382X2          17.65
13        47   3.781 + 0.251X1 + 0.151X2          13.20

The upper triangles of the covariance, correlation and
$(X_i'X_i)^{-1}$ matrices among the slopes for three of the schools
and the full sample are shown in Table 2. The $(X_i'X_i)^{-1}$
matrices are used in the third method of estimation using the pooled
MSE. The elements of the $\mathrm{Cov}(b_i)$ matrices are obtained as
the products of the entries in $(X_i'X_i)^{-1}$ times the MSE [e.g.,
for school 1, the first entry in $\mathrm{Cov}(b_1)$ is 1.934, which
is within rounding error of 17.463 x 0.1107 = 1.933]. Also the MSE
pooled across the 13 schools is $S_{\cdot}^2$ = 12.83.
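
The pooled MSE can be approximately recovered from the summary statistics in Table 1 alone. The sketch below assumes the pooled value is the error-degrees-of-freedom weighted average of the school MSEs, with $dfe_i = n_i - 3$ for a model with an intercept and two slopes; the function name `pooled_mse` is hypothetical.

```python
# (n_i, S_i^2) for the 13 schools, as listed in Table 1.
schools = [(64, 17.46), (59, 14.24), (67, 14.05), (45, 10.75), (47, 9.32),
           (45, 14.60), (45, 9.80), (56, 13.32), (45, 12.65), (51, 6.50),
           (48, 11.02), (45, 17.65), (47, 13.20)]
PARAMETERS = 3   # intercept plus two slopes; assumes dfe_i = n_i - 3


def pooled_mse(samples, n_params):
    """Degrees-of-freedom weighted average of the within-study MSEs."""
    total_df = sum(n - n_params for n, _ in samples)
    weighted = sum((n - n_params) * mse for n, mse in samples)
    return weighted / total_df


s2_pooled = pooled_mse(schools, PARAMETERS)
```

Under these assumptions the result should land close to the reported 12.83; small discrepancies can arise because the tabled MSE values are rounded to two decimals.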
Table 3 repeats the OLS results for the pooled sample (to facilitate
comparisons) and also presents the slopes estimated using the three
synthesis methods described above. The first set of results is based
on the GLS estimation method with mean and variance given in (4) and
(5). While the intercept differs somewhat from the pooled-sample
intercept, the slope coefficients are both within 0.015 of the values
estimated in the full sample. Considering that the slopes represent
predicted change on a 25-point science test (given a one-point change
on X), these are very small differences. The test of homogeneity of
the models using $Q_E$ defined above for all slopes and intercepts
shows that indeed the slopes and intercepts are not homogeneous
($Q_E$ = 114.16, df = 36, p < 0.001), and may not have come from a
single population. However, this test asks whether all parameters are
equal across schools; thus the test statistic can also be large if
the intercepts differ. The test can be computed for the predictor
slopes only (omitting $b_0$ values): when this is done, the $Q_E$
value is smaller ($Q_E$ = 21.74, df = 24, p = 0.59), and indicates
the math and reading slopes are homogeneous across schools. Also at
least one of the slopes differs from zero, according to the $Q_B$
test ($Q_B$ = 518.16, df = 2, p < 0.001).



Table 2
Covariance and (X'X)^{-1} matrices for three studies and full sample

                          (X'X)^{-1}                         Cov(b)                  Corr(b)
Sample              I         M         R           I        M        R           M      R
Full          I  0.01175  -0.00018  -0.00042     0.1697  -0.0026  -0.0060      -0.32  -0.40
(n = 664)     M            0.00003  -0.00003              0.0004  -0.0005             -0.70
              R                      0.00009                       0.0013
School 1      I  0.1107   -0.0037   -0.0017      1.9340  -0.0648  -0.0302      -0.61  -0.22
(n_1 = 64)    M            0.0003   -0.0002               0.0058  -0.0043             -0.57
              R                      0.0006                        0.0098
School 2      I  0.0914   -0.0015   -0.0034      1.3018  -0.0218  -0.0482      -0.36  -0.44
(n_2 = 59)    M            0.0002   -0.0002               0.0028  -0.0030             -0.60
              R                      0.0006                        0.0092
School 3      I  0.4267   -0.0103   -0.0058      5.9953  -0.1449  -0.0817      -0.65  -0.26
(n_3 = 67)    M            0.0006   -0.0004               0.0082  -0.0063             -0.54
              R                      0.0012                        0.0164

Note: I = intercept, M = math slope, R = reading slope. Only the
upper triangle of each matrix is shown.

Table 3
Results of synthesis

                            Slope estimates                     Cov(b)                  Corr(b)
Method of estimation    Intercept   Math   Reading         I        M        R          M      R
Full sample               2.552     0.245   0.358      I  0.1697  -0.0026  -0.0060    -0.32  -0.40
(n = 664)                                              M           0.0004  -0.0005           -0.70
                                                       R                    0.0013
GLS                       2.268     0.247   0.373      I  0.1463  -0.0021  -0.0054    -0.30  -0.41
                                                       M           0.0003  -0.0004           -0.71
                                                       R                    0.0012
WLS                       2.936     0.221   0.343      I  0.1747
                                                       M           0.0004
                                                       R                    0.0012
GLS using (X'X)^{-1}      2.552     0.245   0.358      I  0.1507  -0.0023  -0.0053    -0.32  -0.40
                                                       M           0.0004  -0.0004           -0.70
                                                       R                    0.0012

At this point more detailed analyses of slopes for each predictor
might be of use, and standard univariate meta-analysis procedures
(e.g., Hedges and Olkin (1985)) could be applied to each set of slope
values, or other GLS based analyses can be used if it is desired to
model the vectors of slopes (Raudenbush, Becker and Kalaian, 1988).
Also to explore between-studies differences in models one could then
examine moderating variables as described above. If the slopes or
parameters for additional study features still did not appear
homogeneous, one could estimate between-studies variance components
for each of the slopes. A variety of estimators for the
between-studies variance exist (e.g., Hedges and Olkin (1985); Sidik
and Jonkman, 2005) and an estimated between-studies variance could
then be added to each study's sampling variance to augment its
uncertainty.

The next set of results was obtained by eliminating the off-diagonal
elements from the Cov(b) matrices. This is equivalent to estimating
the slopes using the univariate methods shown in displays (2) and
(3). These values also do not deviate far from the full-sample
values; both slopes are within 0.025 points of the slopes from the
full sample, deviating only slightly more than the GLS values. This
is in spite of the fact that the predictors and outcome show moderate
intercorrelations as can be seen by inspection of the Corr(b)
matrices shown in Table 2. Finally, the third set of results is
computed using the $X^*$ matrix in place of the Cov(b) matrix, and
the pooled MSE in place of each $S_i^2$ value. As noted above the
slope computed in this way is identical to the slope for the full
sample, and the covariance matrix differs from the full sample matrix
by a constant factor equal to the ratio of the estimated pooled MSE
to the full sample MSE (here that ratio is 12.83/14.44 = 0.89). It is
somewhat problematic that the variances of slopes from the
meta-analysis are less than or equal to the values from the full
sample (thus suggesting more precision). From one application it is
not possible to determine whether this is a result of the particular
nature of the example data (11 of 13 schools show MSEs smaller than
the MSE of 14.44 for the full data set) or something more pervasive.
Further examination of the performance of these estimation methods
via Monte Carlo methods will indicate whether a consistent pattern of
underestimation is found.

Our new method takes into account the interrelationships among
predictors from the primary studies, as well as heteroscedasticity of
the slopes, via the variance-covariance matrix of the slopes. Both
features should represent improvements on ordinary least squares
methods. Such OLS approaches typically include dummy variables to
show the presence of specific predictors or study features, but do
not deal with the possible dependence of the predictors in the
model(s), nor do they account for the heteroscedasticity inherent in
the slope estimates. Even when off-diagonal elements of Cov(b) were
set to zero in our analysis, the weighted least squares slopes were
very close to the full sample slopes.

6. CONCLUSION 

This paper presents a review of existing methods 
for the synthesis of regression slopes and a new mul- 
tivariate approach based on generalized least squares 
estimation that is applicable to the meta-analytic 
context. Table 4 summarizes the main strengths and 
weaknesses of all of the methods. Two methods re- 
quire raw data and thus are not appropriate for the 
meta-analysis context. Five others focus only on a 
single focal slope (or some related index such as a t 
test of that slope) and thus cannot provide an over- 
all model based on the synthesis. Also these meth- 
ods ignore dependence among slopes by omitting all 
but the focal slope. Some additionally ignore the in- 
herent differential precision of slopes across studies 
by applying ordinary least squares estimation meth- 
ods. The new multivariate GLS method addresses 
these problems, but is itself limited because infor- 
mation about covariation among slopes is typically 
not given in primary research reports. 

A comparison of the results of three variations of 
the GLS approach applied to an educational data set 
is made to the analysis of all data in a single pooled 
analysis. The analyses produce very similar results 
and in some cases have identical results (given the 
availability of specific summary statistics such as 
mean squared errors for the individual regression 
models). Our results emphasize the importance of 
full reporting of sufficient statistics in primary re- 
search studies; with less complete information, the 
full GLS analysis is not possible. However, even the 
less complex weighted least squares approach ap- 
peared to provide reasonable values in one example 
analysis. 

Table 4
Methods of summarizing slopes

Method: Simple slope summaries
  Data needed: Slopes
  Strength: Simple; little data needed
  Weakness: Focuses on a single focal slope; ignores dependence and
    precision of slopes

Method: Summaries of t statistics
  Data needed: t values for slope tests
  Strength: Simple; little data needed; Xs and Ys can be on any scales
  Weakness: Focuses on only a single focal slope; t values contain
    irrelevant information about sample size; unclear how an index of
    effect is obtained

Method: Iterative least squares approach
  Data needed: Raw data
  Strength: Accounts for covariation among predictors
  Weakness: Iteration needed to get covariance matrix

Method: Dose-response models (WLS approach for dichotomous outcomes)
  Data needed: Slopes and standard errors for models with dichotomous
    outcomes
  Strength: Weights by precision
  Weakness: Focuses on only a single focal slope; ignores dependence
    of slopes

Method: Validity generalization approach
  Data needed: Slopes, reliabilities of X and sample sizes
  Strength: Simple; little data needed
  Weakness: Reliabilities often not reported; ignores dependence of
    slopes

Method: Univariate WLS approach
  Data needed: Slopes and standard errors
  Strength: Relatively simple; weights by precision
  Weakness: Focuses on only a single focal slope; ignores dependence
    of slopes

Method: Multivariate Bayesian approach
  Data needed: Raw data
  Strength: Collateral information can be shared across studies
  Weakness: Multistage formulation; requires priors and
    hyperparameters; Xs and Ys must be on same scales

Method: Multivariate GLS approach
  Data needed: Slopes and Cov(b) matrices
  Strength: Weights by precision; accounts for covariation; provides
    entire pooled model
  Weakness: Requires covariances among slopes, which are often not
    reported

APPENDIX: EQUIVALENCE OF FULL
SAMPLE AND SYNTHESIZED RESULTS
WHEN $\sigma_i^2 = \sigma^2$ FOR i = 1 TO k

Consider k independent samples or studies each examining a model
relating predictors $X_1$ through $X_P$ to an outcome Y for case j.
Specifically, in study i,

$Y_{ij} = \beta_{i0} + \beta_{i1}X_{ij1} + \cdots + \beta_{iP}X_{ijP} + e_{ij}$

for j = 1 to $n_i$.

For later use we also define X and Y by stacking the individual
$X_i$ and $Y_i$ matrices:

$$
X = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_k \end{bmatrix}
\quad\text{and}\quad
Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_k \end{bmatrix}.
$$

The OLS regression slope for the full combined sample is

(A1)  $b^* = (X'X)^{-1}X'Y$.

Within study i, the OLS estimate of
$\beta_i = (\beta_{i0}, \beta_{i1}, \ldots, \beta_{iP})'$ is

$b_i = (b_{i0}, b_{i1}, \ldots, b_{iP})' = (X_i'X_i)^{-1}X_i'Y_i$,

with

$\mathrm{Cov}(b_i) = (X_i'X_i)^{-1}\sigma_i^2$.

If it is reasonable to assume that the error variances $\sigma_i^2$,
for i = 1 to k, are equal (e.g., if the k samples are drawn from one
population), then we have

$\mathrm{Cov}(b_i) = (X_i'X_i)^{-1}\sigma^2$.

Next we define

$$
b = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_k \end{bmatrix}
\quad\text{and}\quad
\Sigma =
\begin{bmatrix}
\mathrm{Cov}(b_1) & & & \\
& \mathrm{Cov}(b_2) & & \\
& & \ddots & \\
& & & \mathrm{Cov}(b_k)
\end{bmatrix}
=
\begin{bmatrix}
(X_1'X_1)^{-1} & & & \\
& (X_2'X_2)^{-1} & & \\
& & \ddots & \\
& & & (X_k'X_k)^{-1}
\end{bmatrix}\sigma^2,
$$
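which is labeled $X^*\sigma^2$ in the text. When inverted, this
matrix is

$$
\Sigma^{-1} =
\begin{bmatrix}
X_1'X_1 & & & \\
& X_2'X_2 & & \\
& & \ddots & \\
& & & X_k'X_k
\end{bmatrix}\sigma^{-2}.
$$

Also the ith cross-product matrix is

$$
X_i'X_i =
\begin{bmatrix}
n_i & \sum_j X_{ij1} & \cdots & \sum_j X_{ijP} \\
\sum_j X_{ij1} & \sum_j X_{ij1}^2 & \cdots & \sum_j X_{ij1}X_{ijP} \\
\vdots & \vdots & \ddots & \vdots \\
\sum_j X_{ijP} & \sum_j X_{ij1}X_{ijP} & \cdots & \sum_j X_{ijP}^2
\end{bmatrix}.
$$

The synthesized GLS slope estimator is

(A2)  $\beta^* = (W'\Sigma^{-1}W)^{-1}W'\Sigma^{-1}b$,

where W is a stack of identity matrices of dimension P + 1. The first
component of the estimator is $(W'\Sigma^{-1}W)$, which is a sum of
matrices:

$W'\Sigma^{-1}W = (X_1'X_1)\sigma^{-2} + (X_2'X_2)\sigma^{-2} + \cdots + (X_k'X_k)\sigma^{-2}$.

Equivalently,

$$
W'\Sigma^{-1}W =
\begin{bmatrix}
\sum_i n_i & \sum_i\sum_j X_{ij1} & \cdots & \sum_i\sum_j X_{ijP} \\
\sum_i\sum_j X_{ij1} & \sum_i\sum_j X_{ij1}^2 & \cdots & \sum_i\sum_j X_{ij1}X_{ijP} \\
\vdots & \vdots & \ddots & \vdots \\
\sum_i\sum_j X_{ijP} & \sum_i\sum_j X_{ij1}X_{ijP} & \cdots & \sum_i\sum_j X_{ijP}^2
\end{bmatrix}\sigma^{-2},
$$

which is simply $X'X\sigma^{-2}$ for the full sample (i.e., the
sample pooled across studies), so
$(W'\Sigma^{-1}W)^{-1} = (X'X)^{-1}\sigma^{2}$. Thus we can write

(A3)  $\beta^* = [(X'X)^{-1}\sigma^{2}]\,W'\Sigma^{-1}b$.

Next we consider the term $W'\Sigma^{-1}b$. The product
$W'\Sigma^{-1}$ is a matrix that is $\sigma^{-2}$ times a
concatenation of $(X_i'X_i)$ matrices, specifically

$W'\Sigma^{-1} = [X_1'X_1 \,|\, X_2'X_2 \,|\, \cdots \,|\, X_i'X_i \,|\, \cdots \,|\, X_k'X_k]\,\sigma^{-2}$.

Also, b is the stacked vector of the k individual sample slope
vectors. Thus

$W'\Sigma^{-1}b = X_1'X_1b_1\sigma^{-2} + X_2'X_2b_2\sigma^{-2} + \cdots + X_k'X_kb_k\sigma^{-2}$.

Each component of this sum is a (P + 1) x 1 vector. Then substituting
$b_i = (X_i'X_i)^{-1}X_i'Y_i$ into this equation, we obtain

$$
W'\Sigma^{-1}b
= X_1'X_1(X_1'X_1)^{-1}X_1'Y_1\sigma^{-2} + \cdots
+ X_k'X_k(X_k'X_k)^{-1}X_k'Y_k\sigma^{-2}
= \sigma^{-2}\sum_i X_i'Y_i
= \sigma^{-2}X'Y.
$$

Substituting this result into (A3), we see that

$$
\beta^* = (X'X)^{-1}\sigma^{2}\,W'\Sigma^{-1}b
= (X'X)^{-1}\sigma^{2}[\sigma^{-2}X'Y]
= (X'X)^{-1}X'Y,
$$

which equals $b^*$ given in (A1).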